Release from Active Learning/Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time

Authors

  • Masashi Sugiyama
  • Hidemitsu Ogawa
Abstract

In supervised learning, the selection of sample points and models is crucial for acquiring a higher level of generalization capability. So far, the problems of active learning and model selection have been studied independently. If sample points and models are optimized simultaneously, a higher level of generalization capability can be expected. We call this problem active learning with model selection. However, this problem cannot generally be solved by simply combining existing active learning and model selection techniques because of the active learning/model selection dilemma: the model should be fixed for selecting sample points, and conversely the sample points should be fixed for selecting models. In spite of the dilemma, we show that the problem of active learning with model selection can be solved straightforwardly if there is a set of sample points that is optimal for all models under consideration. Based on this idea, we give a procedure for active learning with model selection in trigonometric polynomial models.

I. Supervised Learning and Active Learning/Model Selection Dilemma

Let us consider the supervised learning problem of obtaining, from a set of $M$ training examples, an approximation to a target function $f(x)$ of $L$ variables defined on $D$, where $D$ is a subset of the $L$-dimensional Euclidean space $\mathbb{R}^L$. The training examples are made up of sample points $x_m$ in $D$ and corresponding sample values $y_m$ in $\mathbb{C}$:

$$\{(x_m, y_m) \mid y_m = f(x_m) + \epsilon_m\}_{m=1}^{M}, \qquad (1)$$

where $y_m$ is degraded by additive noise $\epsilon_m$. The purpose of supervised learning is to find a learning result function $\hat{f}(x)$ that minimizes a certain generalization error $J_G$.

In supervised learning, there are two factors we can control for optimal generalization: the sample points and the model. The model refers to, for example, the type and number of basis functions used for learning. The problem of designing sample points is called active learning, and the problem of determining the model is called model selection.
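The setup around Eq. (1) can be made concrete with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a one-dimensional domain $D = [-\pi, \pi)$, real-valued samples instead of complex ones, i.i.d. Gaussian noise, a hypothetical target function, ordinary least squares as the learning method, and mean squared error on a dense grid as a stand-in for $J_G$.

```python
import numpy as np

def trig_design(x, order):
    """Design matrix of a real trigonometric polynomial model:
    columns 1, cos(kx), sin(kx) for k = 1..order."""
    cols = [np.ones_like(x)]
    for k in range(1, order + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

def make_training_examples(f, x, noise_std, rng):
    """Sample values y_m = f(x_m) + eps_m, with Gaussian noise (assumed)."""
    return f(x) + noise_std * rng.standard_normal(x.shape)

def fit_least_squares(x, y, order):
    """Coefficients of the learning result function (least squares, assumed)."""
    coef, *_ = np.linalg.lstsq(trig_design(x, order), y, rcond=None)
    return coef

def generalization_error(f, coef, order, n_grid=2000):
    """Mean squared error on a dense grid over D, a stand-in for J_G."""
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    return float(np.mean((trig_design(grid, order) @ coef - f(grid)) ** 2))

rng = np.random.default_rng(0)
target = lambda x: np.sin(x) + 0.5 * np.cos(2 * x)  # hypothetical target f
M = 20                                              # number of training examples
x_train = rng.uniform(-np.pi, np.pi, size=M)        # sample points (passive choice)
y_train = make_training_examples(target, x_train, noise_std=0.1, rng=rng)

coef = fit_least_squares(x_train, y_train, order=2)  # the model fixes the order
print(generalization_error(target, coef, order=2))
```

In this sketch the two controllable factors appear as `x_train` (where to sample) and `order` (which model to use); active learning would choose the former, model selection the latter.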
Let us denote a set of $M$ sample points $\{x_m\}_{m=1}^{M}$ by $\mathcal{X}$, a model by $S$, and a set of models from which the model is selected by $\mathcal{M}$. So far, the problems of active learning and model selection have been studied independently. If sample points and models are optimized simultaneously, a higher level of generalization capability can be expected. We call this problem active learning with model selection.

Definition 1 (Active learning with model selection): Determine sample points $\mathcal{X}$ and select a model from a set $\mathcal{M}$ so that the generalization error $J_G$ is minimized:

$$\min_{\mathcal{X},\, S \in \mathcal{M}} J_G[\mathcal{X}, S]. \qquad (2)$$

In general, the model should be fixed for active learning [4, 7, 3, 6, 5, 14, 15, 18] (footnote 1), and conversely the training examples gathered at fixed sample points are required for model selection [8, 1, 13, 12, 11, 2, 16, 17]. This implies that the problem of active learning with model selection cannot generally be solved by simply combining existing active learning and model selection techniques. We call this the active learning/model selection dilemma. In this paper, we suggest a basic strategy for solving this dilemma and give a practical procedure for active learning with model selection in trigonometric polynomial models.

II. Basic Strategy

As we pointed out in Section I, the problem of active learning with model selection cannot generally be solved by simply combining existing active learning and model selection techniques because of the active learning/model selection dilemma: the model should be fixed for active learning, and conversely the sample points should be fixed for model selection. However, if there is a set $\mathcal{X}$ of sample points that is optimal for all models in the set $\mathcal{M}$, the problem of...

Footnote 1: Some of the methods are incremental active learning methods, so it is possible to change the model during the incremental learning process. However, such active learning methods essentially work for a fixed model, i.e., the sample points are designed to be optimal for the current model.
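[Figure: for a model set $\mathcal{M} = \{S_1, S_2, S_3\}$, the sample points optimal for each individual model, $X_i = \arg\min_{\mathcal{X}} J_G[\mathcal{X}, S_i]$ for $i = 1, 2, 3$, and the sample points $X_{\mathcal{M}} = \arg\min_{\mathcal{X}} J_G[\mathcal{X}, S]$.]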
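The basic strategy stated in the abstract, looking for sample points that are optimal for every model under consideration, can be sketched as a check over a small finite set of candidates. This is a hedged illustration, not the paper's procedure: the criterion below (the trace of $(A^\top A)^{-1}$, proportional to the noise variance of least-squares coefficients) is an assumed stand-in for $J_G$, and the candidate designs, their names, and the function names are hypothetical.

```python
import numpy as np

def trig_design(x, order):
    """Design matrix with columns 1, cos(kx), sin(kx) for k = 1..order."""
    cols = [np.ones_like(x)]
    for k in range(1, order + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

def criterion(x, order):
    """Assumed stand-in for J_G[X, S]: trace of (A^T A)^{-1}."""
    A = trig_design(x, order)
    return float(np.trace(np.linalg.inv(A.T @ A)))

def common_optimal_design(designs, models, tol=1e-9):
    """Name of a candidate design that minimizes the criterion for every
    model in `models`, or None if no single candidate does."""
    best_sets = []
    for order in models:
        scores = {name: criterion(x, order) for name, x in designs.items()}
        lowest = min(scores.values())
        best_sets.append({n for n, s in scores.items() if s <= lowest * (1 + tol)})
    common = set.intersection(*best_sets)
    return sorted(common)[0] if common else None

M = 12                                   # number of sample points
rng = np.random.default_rng(1)
designs = {                              # hypothetical candidate sets X
    "equidistant": -np.pi + 2 * np.pi * np.arange(M) / M,
    "random": rng.uniform(-np.pi, np.pi, size=M),
    "half_interval": np.linspace(0.0, np.pi, M),
}
models = [1, 2, 3]                       # model set: trigonometric orders

print(common_optimal_design(designs, models))
```

If the function returns a design name, the dilemma does not arise for these candidates: training examples can be gathered once at that design, and any model selection criterion can then be applied to them without revisiting the choice of sample points. If it returns `None`, the choice of sample points still depends on the model.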


Related resources

Active Learning with Model Selection — Simultaneous Optimization of Sample Points and Models for Trigonometric Polynomial Models

In supervised learning, the selection of sample points and models is crucial for acquiring a higher level of the generalization capability. So far, the problems of active learning and model selection have been independently studied. If sample points and models are simultaneously optimized, then a higher level of the generalization capability is expected. We call this problem active learning wit...


Active Learning with Model Selection for Optimal Generalization

Abstract: In supervised learning, the selection of sample points and models is crucial for acquiring a higher level of generalization capability. So far, the problems of active learning and model selection have been independently studied. If sample points and models are simultaneously optimized, then a higher level of generalization capability is expected. We call this problem active learning w...


Coping with Active Learning with Model Selection Dilemma: Minimizing Expected Generalization Error

Optimally designing the location of training input points (active learning) and choosing the best model (model selection) are two important ingredients of supervised learning and have been studied extensively. However, these two issues seem to have been investigated separately as two independent problems. If training input points and models are simultaneously optimized, the generalization perfo...


Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features

The heart is one of the most important organs of the body, and heart disease is the leading cause of death in the world and in Iran. This is why early and timely diagnosis is essential for preventing deaths from this disease and reducing their number. So far, many studies have addressed heart disease with the aim of prediction, diagnosis, and treatment. However, most of them have been mostly f...


Active Learning with Model Selection in Linear Regression

Optimally designing the location of training input points (active learning) and choosing the best model (model selection) are two important components of supervised learning and have been studied extensively. However, these two issues seem to have been investigated separately as two independent problems. If training input points and models are simultaneously optimized, the generalization perfor...



Publication date: 2002